Resampling Statistics
for the F-22A Lot 5
Suitability Analysis


Mr.  Juan  P  Perez 
59  TES/EAA 


UNCLASSIFIED 


59th TES Operations


59th TES in conjunction with 422nd TES

>  Perform  Operational  Test  &  Evaluation  (OT&E) 

>  Tactics  Development 

>  Fielding  Recommendations 

Weapon  Systems 

>  A-10,  F-15C,  F-15E,  F-16,  F-22A,  HH-60 


Purpose 

Explain  the  problem 

Demonstrate  the  Resampling  Technique 

>  Construct  the  Cumulative  Distribution  Function  (CDF) 

>  Build  a  Confidence  Interval 

>  Calculate  Power/Consumer  Risk 

Demonstrate  application  tool  used  for  the  suitability 
analysis 


Overview

❖Test Problem
❖Background
❖Test Objective
❖Test Methodology
❖Reporting
❖Conclusion


Test  Problem: 
Lot  5  System  Description 


Advanced Radar Program: PIP B171 & DO-03
IFDL Controller: PIP212
DEW: PIP209A, New L5 S/W
Aux Com/ACP: PIP211 (Lot 6 H/W), S/W only Lot 5
MLD Sensor: PIP238FGA
IFDL RT: PIP212
Radar Antenna
Radar Receiver Exciter
Radar Processor
DMVR
DMS

Functionality Equivalent to Lot 3 Production Aircraft

Enhance radar’s future capabilities

•New contractor building “same” parts


Test  Problem: 
F-22A  Lot  5  Suitability  Analysis 


Test’s  Purpose:  To  compare  currently  fielded  aircraft 

suitability  results  to  Lot  5  aircraft. 

>  Operational  Utility  Evaluation  (OUE):  compare 
effectiveness  and  suitability  between  F-22A  Lot  5  aircraft 
(new  data  set)  and  currently  fielded  F-22A  aircraft 
(baseline  data  set) 

>  Suitability  goal:  evaluate  Lot  5  hardware  updates  only 

>  Suitability  structure:  compare  data  using  Reliability, 
Maintainability  and  Availability  (RM&A). 


Test  Problem: 
F-22A  Lot  5  Suitability  Analysis 


Analyst’s  Challenge:  How  to  compare  current 
suitability  results  to  new  Lot  5  suitability  measures? 

>  Baseline  data  (currently  fielded  aircraft)  includes  758 
flight  hours.  Lot  5  test  has  shorter  timeline 
requirements. 

>  The  test  requires  a  minimum  70%  power  /  30% 
consumer  risk  in  detecting  at  least  a  doubling  (twice  as 
bad)  in  any  suitability  measure. 

>  The  new  test  consists  of  200  flight  hours 




Background: 
Suitability 

Operational  suitability:  the  degree  to  which  a  system  can 
be  satisfactorily  placed  into  fielded  use 

>  reliability,  maintainability,  availability  (RM&A) 

Availability:  affected  by  reliability  and  maintainability 

Other  considerations:  compatibility,  transportability, 
wartime  usage  rates,  safety,  human  factors,  manpower 
supportability,  logistics  supportability,  documentation, 
training  requirements,  and  natural  environmental  effects 
and  impacts 


Background: 
Suitability 

Reliability: the probability that a system will perform its function

>  Time-continuous: operate the system until it fails, fix it, and continue to operate. The process repeats until enough information is collected

>  Success/Fail: test the system and record successes and failures

Maintainability: the ability of an item to be retained in, or restored to, a specified condition when maintenance is performed (AFI 10-602)

>  Average time between maintenance activity

>  Cumulative maintenance time divided by flying hours

Availability: the probability that a system will be in an operable state at a random point in time

>  Operational time divided by the total time
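
The ratio-style measures above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and treating "average time between maintenance activity" as flying hours divided by the number of maintenance events is an assumption, not a definition taken from the briefing.

```python
def mean_time_between_maintenance(flying_hours, maintenance_events):
    """Average flying time between maintenance activities (assumed definition)."""
    if maintenance_events == 0:
        return float("inf")  # no maintenance events observed
    return flying_hours / maintenance_events


def maintenance_time_per_flying_hour(cumulative_maintenance_hours, flying_hours):
    """Cumulative maintenance time divided by flying hours."""
    return cumulative_maintenance_hours / flying_hours


def availability(operational_hours, total_hours):
    """Operational time divided by the total time."""
    return operational_hours / total_hours
```

For example, `availability(90, 100)` gives 0.9 for a system that was operable for 90 of 100 hours (made-up figures).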


Background: 
Suitability  Analysis 


Reliability,  Maintainability  and  Availability  Evaluation 

>  New  Lot  5  hardware  data  collection  only 

>  Test scoring data Measures of Performance (MOP):

Break rate
Mean time between critical failure (MTBCF)
Could not duplicate rate
Maintenance man-hours per flying hour
2/4/8 hour fix rate
Abort rate
Mean time between maintenance
Weapons system reliability
Integrated diagnostics accuracy
Mean down time
Mean repair time


Background: 
Resampling 

What  is  Resampling? 

❖  Mechanism used to produce a hypothetical distribution by randomly taking samples from an observed distribution or baseline distribution

❖  Essentially: Monte Carlo simulation of statistical results

❖  Basic Rules:

1.  Specify the universe to sample from

>  observed data set or baseline data set

2.  Specify the sampling procedure

>  number of samples
>  sizes of samples
>  sampling with or without replacement

3.  Specify the statistic you wish to keep track of

>  ratio, mean, variance, etc.
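
The three basic rules map directly onto the parameters of a small resampling routine. The sketch below is illustrative only: the function name `resample`, its parameters, and the `observed` data set are made up for the example and do not come from the briefing or its spreadsheet tool.

```python
import random
from statistics import mean


def resample(universe, sample_size, n_samples, statistic, replace=True, seed=None):
    """Draw repeated samples from `universe` and record `statistic` for each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        if replace:
            sample = rng.choices(universe, k=sample_size)  # with replacement
        else:
            sample = rng.sample(universe, sample_size)     # without replacement
        results.append(statistic(sample))
    return results


# Rule 1: the universe to sample from (a made-up observed data set).
observed = [3, 5, 7, 8, 12, 13, 14, 18, 21]

# Rules 2 and 3: 1,000 samples of size 9, drawn with replacement, tracking the mean.
means = resample(observed, sample_size=9, n_samples=1000, statistic=mean, seed=1)
```

The list `means` is the hypothetical distribution of the chosen statistic; sorting it gives an empirical CDF from which bounds can be read.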


Background: 

Resampling 


Advantages 

>  Simple  to  use  and  teach 

>  Avoids  using  the  wrong  method 

>  Knowledge  of  distribution  not  needed 

>  Free  of  mathematical  formulas  and  restrictive  assumptions 


Disadvantages 

>  Requires  empirical  data 

>  Sample  size  can  be  a  problem  if  the  baseline  data  is  limited 




Test  Objective: 
F-22A  Lot  5  Suitability  Analysis 

Assess  operational  effectiveness  of  new  Lot  5  hardware 

Compare  Lot  5  hardware  and  equivalent  hardware  for 
currently  fielded  aircraft  suitability 

>  Report  results  outside  a  90%  confidence  bound  as 
significantly  different 

>  If  not  significantly  different,  report  power/consumer  risk 

>  The  formal  hypothesis  for  each  MOP  for  this  approach  is 
as  follows: 


H0: Lot 5 MOP ≥ FDE MOP

HA: Lot 5 MOP < FDE MOP (α = 0.1)


**  Note:  hypothesis  assumes  that  a  larger  value  is  better 


Test  Objective: 
Algorithm 


Repeat  flow  for  each  suitability  MOP 




Test  Methodology: 

Example 

Mean  Time  Between  Critical  Failure  (MTBCF) 

MTBCF = Total Flight Hours / Number of Critical Failures

❖  The suitability study for currently fielded aircraft recorded 18 critical failures (CF) during 758 hours of operation, for a MTBCF of 42.11

❖  Create  a  representative  population  using  758  binary  values 

>  Assume  1  sample  per  flying  hour 

>  Assign  a ‘1’ to  CFs  (18  occurrences) 

>  Assign  a  ‘0’  to  no  CFs  (740  occurrences) 

❖  Use  resampling  to  produce  a  200  hour  representative  (hypothetical) 
population 

❖  Construct  a  CDF  of  the  200  hour  MTBCF 

❖  Use  the  CDF  of  the  200  hour  data  to  find  the  lower  90%  confidence 
bound 
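
The steps above can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet tool used for the analysis: sampling with replacement, the number of trials, the seed, and the function names are all assumptions.

```python
import random

BASELINE_HOURS = 758      # flight hours in the fielded-aircraft study
BASELINE_FAILURES = 18    # critical failures recorded in those hours
TEST_HOURS = 200          # planned Lot 5 flight hours


def mtbcf(flight_hours, critical_failures):
    """Total flight hours divided by number of critical failures."""
    if critical_failures == 0:
        return float("inf")  # no failures observed in the sample
    return flight_hours / critical_failures


def resample_mtbcf(n_trials=10_000, seed=0):
    """Return sorted MTBCF values from resampled 200-hour populations."""
    rng = random.Random(seed)
    # One binary value per flying hour: 1 = critical failure, 0 = none.
    population = [1] * BASELINE_FAILURES + [0] * (BASELINE_HOURS - BASELINE_FAILURES)
    values = []
    for _ in range(n_trials):
        sample = rng.choices(population, k=TEST_HOURS)  # with replacement (assumed)
        values.append(mtbcf(TEST_HOURS, sum(sample)))
    return sorted(values)


def empirical_cdf(sorted_values, x):
    """Fraction of resampled MTBCF values less than or equal to x."""
    return sum(1 for v in sorted_values if v <= x) / len(sorted_values)


def lower_bound(sorted_values, confidence=0.90):
    """MTBCF value below which roughly (1 - confidence) of the samples fall."""
    index = int((1 - confidence) * len(sorted_values))
    return sorted_values[index]


values = resample_mtbcf()
print(f"baseline MTBCF: {mtbcf(BASELINE_HOURS, BASELINE_FAILURES):.2f}")
print(f"lower 90% bound for 200 hours: {lower_bound(values):.2f}")
```

Because a 200-hour sample can only contain a whole number of failures, the resampled MTBCF takes discrete values (200/1, 200/2, ...), so the bound lands on one of those steps and can shift with the seed or the replacement choice.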


Test  Methodology: 

Test  Model 


[Figure: screenshots of the spreadsheet resampling tool. Legible fragments: a callout beginning "Now we have a hypothetical", the dialog for selecting the cells you want to score, and the output sheet.]
Test Methodology:
Confidence  Interval 


FDE MTBCF Confidence Interval
(For 200 Flying Hours)

[Figure: CDF of the FDE MTBCF for 200 flying hours. X-axis: Mean Time Between Critical Failures, 0 to 200. Y-axis: cumulative percentage, 0% to 100%.]

Algorithm steps shown alongside the plot:

>  Compare combined new Lot 5 hardware only to fielded equivalent combined hardware only

>  Based on equivalent results

>  Find the 90% confidence bound

>  No, Report Power/Consumer Risk

>  Compare each sub-system and provide a detailed report of what caused [text obscured in the source]

>  Report Results

Example:

>  Confidence bound is 25

>  For example, let’s say that Lot 5 MTBCF = 33

>  33 is in the 90% of the data, therefore there is no significant difference between Lot 5 and the currently fielded hardware

Test  Methodology: 
MTBCF  Power 


CDF constructed using resampling statistics

Use resampled results to calculate power

Power indicates the confidence level the test design provides to detect a particular level of increase in the MTBCF based on possible results


                                   | Null hypothesis is true                            | Null hypothesis is false
Reject the null hypothesis         | Type I error (rejecting a true null hypothesis), α | Correct decision
Fail to reject the null hypothesis | Correct decision                                   | Type II error (failing to reject a false null hypothesis), β

Test  Methodology: 
Power  (Classical  Method) 


❖  Classical  method:  theory  based 

❖  Resampling  method:  observation  based 

❖  An  example  to  demonstrate  the  classical  method 

>  Measuring  successes  and  failures,  therefore  use  the  binomial 
distribution 

>  Probability  Mass  Function:  PMF 

The BINOMDIST Excel function provides the CDF/PMF of the binomial distribution

Function = BINOMDIST(number of successes, trials, probability of success, cumulative)

Number of successes = (Trials - CF)
Trials = Total Flight Hours
Probability = (Trials - CF)/Trials
Cumulative = TRUE provides the CDF; FALSE provides the PMF
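
For readers working outside Excel, the same calculation can be sketched in Python with the standard library. The function below is a hypothetical stand-in that mirrors the four BINOMDIST arguments listed above; it is not the Excel implementation.

```python
from math import comb


def binomdist(successes, trials, p_success, cumulative):
    """Binomial PMF (cumulative=False) or CDF (cumulative=True)."""
    def pmf(k):
        return comb(trials, k) * p_success**k * (1 - p_success) ** (trials - k)

    if cumulative:
        return sum(pmf(k) for k in range(successes + 1))
    return pmf(successes)


# Baseline from the slides: 758 flight hours (trials) with 18 critical failures.
trials = 758
cf = 18
p_success = (trials - cf) / trials
print(binomdist(trials - cf, trials, p_success, cumulative=False))
```

As in the slide, each flight hour is treated as one trial and a "success" is an hour without a critical failure.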


Test  Methodology: 
Power  (Classical  Method) 


The PMF (blue) was generated based on the known baseline (758 FH with 18 CFs)


The red PMF - Distribution is twice as bad as the main baseline


The blue PMF - Distribution is three times as bad as the main baseline


Any  results  outside  the  90%  (lower  than  28.57  MTBCF)  will  be 
considered  significantly  different. 
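
A sketch of the comparison the three PMFs illustrate, under stated assumptions: a 200-hour test, a per-hour failure probability taken from the baseline, "twice" and "three times as bad" read as doubling and tripling that probability, and a rejection region of 8 or more critical failures (the first result with MTBCF lower than 28.57). The helper name is hypothetical.

```python
from math import comb

TEST_HOURS = 200
P_BASELINE = 18 / 758          # critical failures per flying hour in the baseline
MIN_FAILURES_TO_REJECT = 8     # 200 / 8 = 25, the first MTBCF lower than 28.57


def prob_at_least(k_min, trials, p):
    """P(X >= k_min) for X ~ Binomial(trials, p)."""
    below = sum(comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(k_min))
    return 1 - below


false_alarm = prob_at_least(MIN_FAILURES_TO_REJECT, TEST_HOURS, P_BASELINE)
power_double = prob_at_least(MIN_FAILURES_TO_REJECT, TEST_HOURS, 2 * P_BASELINE)
power_triple = prob_at_least(MIN_FAILURES_TO_REJECT, TEST_HOURS, 3 * P_BASELINE)

print(f"P(reject | baseline)          = {false_alarm:.3f}")
print(f"P(reject | twice as bad)      = {power_double:.3f}")
print(f"P(reject | three times as bad) = {power_triple:.3f}")
```

The first figure is the chance of flagging a difference when none exists; the other two are the classical power against the doubled and tripled failure rates.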


Test  Methodology: 
MTBCF  Power  (Resampling) 


Report  Results 


>  Compare combined new Lot 5 hardware only to fielded equivalent combined hardware only

>  No, Report Power/Consumer Risk

>  Yes: Compare each sub-system of new Lot 5 hardware only to fielded equivalent sub-system hardware only

❖  Baseline MTBCF of 42.11 FH (758 hours over 18 critical failures)

❖  We expect 5 critical failures over 200 hours

❖  If the new Lot 5 hardware provided a MTBCF of 33 FH (6 critical failures over 200 hours):

>  >86% power (14% consumer risk) that the true Lot 5 MTBCF did not double (10 critical failures over 200 hours)

>  >99% power (1% consumer risk) that the true Lot 5 MTBCF did not triple (15 critical failures over 200 hours)
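
One way to reproduce this kind of power figure by resampling is sketched below. It is an interpretation rather than the briefing's documented procedure: it assumes power is read as the fraction of resampled 200-hour tests, drawn from a degraded population, that show more critical failures than the 6 observed. Sampling with replacement, the trial count, the seed, and the function name are also assumptions.

```python
import random

TEST_HOURS = 200
OBSERVED_FAILURES = 6   # Lot 5 example from the slide (MTBCF of about 33 FH)


def resampled_power(degraded_failures, n_trials=5_000, seed=0):
    """Fraction of resampled 200-hour tests from a degraded population that
    show more critical failures than were observed."""
    rng = random.Random(seed)
    # Degraded population: one binary value per flying hour.
    population = [1] * degraded_failures + [0] * (TEST_HOURS - degraded_failures)
    worse = 0
    for _ in range(n_trials):
        failures = sum(rng.choices(population, k=TEST_HOURS))  # with replacement
        if failures > OBSERVED_FAILURES:
            worse += 1
    return worse / n_trials


for label, failures in (("double", 10), ("triple", 15)):
    power = resampled_power(failures)
    print(f"{label}: power = {power:.1%}, consumer risk = {1 - power:.1%}")
```

Consumer risk is simply one minus the power, which matches how the two figures are paired on the slide.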




Conclusion 


❖  Resampling Statistics makes possible:

>  Construction  of  CDFs 

>  Building  Confidence  Intervals 

>  Calculating  Power/Consumer  Risk 

❖  Resampling  Statistics  with  F-22A  Lot  5  suitability: 

>  Facilitates  comparison  between  large  baseline  sample  size  and 
small  test  sample  size 

>  Provides  method  to  compute  power 

>  Evaluates  data  without  knowledge  of  the  distribution 

>  Eliminates  mathematical  formulas  and  restrictive  assumptions 